Factorizing Probabilistic Graphical Models Using Co-occurrence Rate

Author: Zhu Zhemin
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2011

Factorization is of fundamental importance in the area of Probabilistic Graphical Models (PGMs). In this paper, we theoretically develop a novel mathematical concept, Co-occurrence Rate (CR), for factorizing PGMs. CR has three obvious advantages: (1) CR provides a unified mathematical foundation for factorizing different types of PGMs. We show that Bayesian Network Factorization (BN-F), Conditional Random Field Factorization (CRF-F), Markov Random Field Factorization (MRF-F) and Refined Markov Random Field Factorization (RMRF-F) are all special cases of CR Factorization (CR-F); (2) CR has a simple probabilistic definition and a clear intuitive interpretation. CR-F gives not only the scopes of the factors, but also the exact probability functions of these factors; (3) CR connects probability factorization and graph operations perfectly. The factorization process of CR-F can be visualized as applying a sequence of graph operations, including partition, merge, duplicate and condition, to a PGM graph. We further obtain an important result: with CR-F, on TCG graphs the scopes of factors can be exactly over maximal cliques without any default configuration. This improves on the results of (R)MRF-F, which need default configurations, and also indicates that (R)MRF-F, as special cases of CR-F, cannot always achieve the optimal results of CR-F.
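The abstract above does not state the definition of CR. As a hedged illustration only, the form commonly given for the co-occurrence rate is the ratio of a joint probability to the product of its marginals. The notation below (random variables X_i, chain labels Y_i) is an assumption made for this sketch and is not taken from the abstract:

```latex
% Assumed definition: joint probability divided by the product of marginals.
\mathrm{CR}(X_1; \dots; X_n) = \frac{P(X_1, \dots, X_n)}{\prod_{i=1}^{n} P(X_i)}

% Under a first-order Markov assumption, a linear chain then factorizes as
P(Y_1, \dots, Y_n) = \prod_{i=1}^{n} P(Y_i) \, \prod_{i=1}^{n-1} \mathrm{CR}(Y_i; Y_{i+1})
```

The chain identity follows from writing P(Y_{i+1} | Y_i) as P(Y_i, Y_{i+1}) / P(Y_i) and multiplying and dividing by P(Y_{i+1}). A CR value of 1 corresponds to independence.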

University of Twente Research Information

Concept Extraction Challenge: University of Twente at #MSM2013

Author: Habib Mena B.
Keulen Maurice van
Zhu Zhemin
Publication venue: CEUR
Publication date: 01/01/2013

Twitter messages are a potentially rich source of continuously and instantly updated information. The shortness and informality of such messages pose challenges for Natural Language Processing tasks. In this paper we present a hybrid approach for Named Entity Extraction (NEE) and Classification (NEC) for tweets. The system combines Conditional Random Fields (CRF) and Support Vector Machines (SVM) in a hybrid way to achieve better results. For named entity type classification we used the AIDA disambiguation system [YosefHBSW11] to disambiguate the extracted named entities and hence find their types.

Maastricht University Research Portal

CiteSeerX

University of Twente Research Information

Empirical co-occurrence rate networks for sequence labeling

Author: Apers Peter
Hiemstra Djoerd
Wombacher Andreas
Zhu Zhemin
Publication venue: Erasmus University Rotterdam
Publication date: 01/01/2013

Sequence labeling has wide applications in many areas. For example, most named entity recognition tasks, which extract named entities or events from unstructured data, can be formalized as sequence labeling problems. Sequence labeling has been studied extensively in different communities, such as data mining, natural language processing and machine learning. Many powerful and popular models have been developed, such as hidden Markov models (HMMs) [4], conditional Markov models (CMMs) [3], and conditional random fields (CRFs) [2]. Despite their successes, they suffer from some known problems: (i) HMMs are generative models which suffer from the mismatch problem, and it is difficult to incorporate overlapping, non-independent features into an HMM explicitly; (ii) CMMs suffer from the label bias problem; (iii) CRFs overcome the problems of HMMs and CMMs, but the global normalization of CRFs can be very expensive. This prevents CRFs from being applied to big datasets (e.g. Tweets).

In this paper, we propose the empirical Co-occurrence Rate Networks (ECRNs) [5] for sequence labeling. CRNs avoid the problems of the existing models mentioned above. To make the training of CRNs as efficient as possible, we simply use the empirical distribution as the parameter estimate. This results in ECRNs, which can be trained orders of magnitude faster and still obtain accuracy competitive with the existing models. ECRN has been applied as a component of the University of Twente system [1] for the concept extraction challenge at #MSM2013, which won the best challenge submission award. ECRNs can be very useful for practitioners working with big data.
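To make "use the empirical distribution as the parameter estimate" concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that the co-occurrence rate of two adjacent labels is their joint probability divided by the product of their marginals, and the function name `empirical_cr` is hypothetical. Marginals are estimated from all label occurrences, which is a simplification.

```python
from collections import Counter


def empirical_cr(sequences):
    """Estimate label marginals and adjacent-pair co-occurrence rates by counting.

    Assumption (not from the abstract): CR(a; b) = P(a, b) / (P(a) * P(b)).
    """
    unigram = Counter()
    bigram = Counter()
    for seq in sequences:
        unigram.update(seq)                # count single labels
        bigram.update(zip(seq, seq[1:]))   # count adjacent label pairs
    n_uni = sum(unigram.values())
    n_bi = sum(bigram.values())
    # Empirical marginal probability of each label.
    p = {label: count / n_uni for label, count in unigram.items()}
    # Empirical co-occurrence rate of each observed adjacent pair.
    cr = {
        (a, b): (count / n_bi) / (p[a] * p[b])
        for (a, b), count in bigram.items()
    }
    return p, cr


if __name__ == "__main__":
    # Toy label sequences, invented for illustration only.
    p, cr = empirical_cr([["A", "B"], ["A", "B"]])
    print(p, cr)
```

Because every quantity is a ratio of counts, a single pass over the training labels suffices and no iterative optimization or global normalization is involved.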

CiteSeerX

Radboud Repository

University of Twente Research Information

Named Entity Extraction and Linking Challenge: University of Twente at #Microposts2014

Author: Habib Mena B.
Keulen Maurice van
Zhu Zhemin
Publication venue: CEUR-WS.org
Publication date: 01/01/2014

Twitter is a potentially rich source of continuously and instantly updated information. The shortness and informality of tweets pose challenges for Natural Language Processing (NLP) tasks. In this paper, we present a hybrid approach for Named Entity Extraction (NEE) and Linking (NEL) for tweets. Although NEE and NEL are two topics that are well studied in the literature, almost all approaches treat the two problems separately. We believe that disambiguation (linking) can help improve the extraction process. We call this potential for mutual improvement the reinforcement effect. It mimics the way humans understand natural language. Furthermore, our proposed approach handles the uncertainties involved in the two processes by considering possible alternatives.

CiteSeerX

Maastricht University Research Portal

University of Twente Research Information

Separate Training for Conditional Random Fields Using Co-occurrence Rate Factorization

Author: Apers Peter
Hiemstra Djoerd
Wombacher Andreas
Zhu Zhemin
Publication date: 01/01/2012

The standard training method of Conditional Random Fields (CRFs) is very slow for large-scale applications. As an alternative, piecewise training divides the full graph into pieces, trains them independently, and combines the learned weights at test time. In this paper, we present separate training for undirected models based on the novel Co-occurrence Rate Factorization (CR-F). Separate training is a local training method. In contrast to MEMMs, separate training is unaffected by the label bias problem. Experiments show that separate training (i) is unaffected by the label bias problem; (ii) reduces the training time from weeks to seconds; and (iii) obtains results competitive with standard and piecewise training on linear-chain CRFs.

arXiv.org e-Print Archive

CiteSeerX

Radboud Repository

University of Twente Research Information

Mechanism-based site-directed mutagenesis to shift the optimum pH of the phenylalanine ammonia-lyase from Rhodotorula glutinis JN-1

Author: Cui Wenjing
Liu Zhongmei
Zhou Li
Zhou Zhemin
Zhu Longbao
Publication venue: The Authors. Published by Elsevier B.V.
Publication date: 30/09/2014

Phenylalanine ammonia-lyase (RgPAL) from Rhodotorula glutinis JN-1 stereoselectively catalyzes the conversion of l-phenylalanine into trans-cinnamic acid and ammonia, and has been used in the chiral resolution of dl-phenylalanine to produce d-phenylalanine under acidic conditions. However, the optimum pH of RgPAL is 9, and RgPAL exhibits low catalytic efficiency under acidic conditions. Therefore, a mutant RgPAL with a lower optimum pH is desirable. Based on the catalytic mechanism and structure analysis, we constructed a mutant, RgPAL-Q137E, by site-directed mutagenesis, and found that this mutant had an extended optimum pH range of 7–9, with activity 1.8-fold higher than that of the wild type at pH 7. As revealed by the Friedel–Crafts-type mechanism of RgPAL, the improvement of RgPAL-Q137E might be due to the negative charge of Glu137, which could stabilize the intermediate transition states through electrostatic interaction. The RgPAL-Q137E mutant was used to resolve racemic dl-phenylalanine; the conversion rate and the eeD value of d-phenylalanine using RgPAL-Q137E at pH 7 increased by 29% and 48%, reaching 93% and 86%, respectively. This work provides an effective strategy for shifting the optimum pH, which is favorable to further applications of RgPAL.

Elsevier - Publisher Connector

Directory of Open Access Journals

Empirical training for conditional random fields

Author: Apers Peter M.G.
Hiemstra Djoerd
Wombacher Andreas
Zhu Zhemin
Publication venue: Radboud University
Publication date: 03/06/2013

In this paper (Zhu et al., 2013), we present a practically scalable training method for CRFs called Empirical Training (EP). We show that standard training with unregularized log likelihood can have many maximum likelihood estimates (MLEs). Empirical training has a unique closed-form MLE which can be calculated from the empirical distribution very fast. The MLE of empirical training is also one MLE of standard training, so empirical training can be competitive in precision with standard training and piecewise training. We also show that empirical training is unaffected by the label bias problem even though it is a locally normalized model. Experiments on two real-world NLP datasets also show that empirical training reduces the training time from weeks to seconds, and obtains results competitive with standard and piecewise training on linear-chain CRFs, especially when training data are insufficient.

University of Twente Research Information

A Short Note on Aberrant Responses Bias in Item Response Theory

Author: Bing Jia
Xue Zhang
Zhemin Zhu
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2019

Item response models often cannot calculate true individual response probabilities because of the existence of response disturbances (such as guessing and cheating). Many studies on aberrant responses under the item response theory (IRT) framework have been conducted. Some of them focused on how to reduce the effect of aberrant responses, and others focused on how to detect aberrant examinees, for example through person fit analysis. The purpose of this research was to derive a generalized formula of bias with and without aberrant responses that shows mathematically the effect of both non-aberrant and aberrant response data on the bias of capability estimation. A new evaluation criterion, named aberrant absolute bias (|ABIAS|), was proposed to detect aberrant examinees. Simulation studies and an application to a real dataset were conducted to demonstrate the efficiency and the utility of |ABIAS|.

Directory of Open Access Journals

//Rondje Zilverling: COMMIT/TimeTrails

Author: By Rolf de
Flokstra Jan
Graaff Victor de
Keulen Maurice van
Wombacher Andreas
Zhu Zhemin
Publication venue: Inter-Actief, University of Twente
Publication date: 01/01/2013

The TimeTrails project is about data mining in large volumes of data on events in space and time, i.e. data with coordinates and timestamps. Such data are typically collected by people, sensors and scientific observations. Data analysis often focuses on the four W's: Who, What, Where and When. An important issue is coping with the large volumes of data, i.e. "big data". At the UT, we, i.e. the EWI/DB and ITC/GIP groups, are working on two applications:
* Mapping public opinion on large infrastructure projects, such as the construction of a new stretch of motorway. We do this with Twitter analysis and data visualization.
* Finding good holiday destinations. Social media, web harvesting and the analysis of GPS traces play a role here.

University of Twente Research Information